Map Matching and Trajectory Analysis

ABSTRACT

A trajectory may be derived from noisy location data by mapping candidate locations for a user, then finding a match between successive locations. Location data may come from various sources, including telecommunications networks. Telecommunications networks may give location data based on observations of users in a network, and such data may have many inaccuracies. The observations may be mapped to physical constraints, such as roads, pathways, train lines, and the like, as well as applying physical rules such as speed analysis to smooth the data and identify outlier data points. A trajectory may be resampled or interpolated to generate a detailed set of trajectory points from a sparse and otherwise ambiguous dataset.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims benefit of and priority to PCT/IB2017/050891 filed 17 Feb. 2017 by DataSpark, PTE, LTD entitled “Mobility Gene for Trajectory Data,” PCT/IB2017/050892 filed 17 Feb. 2017 by DataSpark, PTE, LTD entitled “Mobility Gene for Visit Data,” PCT/SG2017/050485 filed 27 Sep. 2017 by DataSpark, PTE, LTD entitled “Trajectory Analysis With Mode Of Transport Analysis,” and PCT/SG2017/050484 filed 27 Sep. 2017 by DataSpark, PTE, LTD entitled “Map Matching and Trajectory Analysis,” PCT/SG2018/050006 filed 5 Jan. 2018 by DataSpark, PTE, LTD entitled “Trajectory Analysis Through Fusion of Multiple Data Sources,” PCT/SG2018/050068 filed 14 Feb. 2018 entitled “Stay And Trajectory Identification From Historical Analysis of Communications Network Observations,” PCT/SG2018/050070 filed 14 Feb. 2018 by DataSpark, PTE, LTD entitled “Real Time Trajectory Identification From Communications Network Observations,” the entire contents of which are hereby expressly incorporated by reference for all they teach and disclose.

BACKGROUND

Mobility data is being gathered on a tremendous scale. Every cellular telephone connection to every mobile device generates some data about a user's location. These observations are being generated at an astonishing rate, but the sheer volume of the observations makes the data difficult to analyze.

Mobility data can be generated by merely observing a location for a device connected to a wireless network. The wireless network may be a cellular network, but also may be any other network from which a device may be observed. For example, a WiFi router or Bluetooth device may passively observe nearby devices, and may note the device's various electronic identification or other signatures. In many cases, a device may establish a communications session with various network access points, which may indicate the device's location.

Many interesting uses come from analyzing mobility data. As merely one example, traffic congestion may be observed from aggregating mobility observations from cellular telephones.

As more and more uses for mobility data are developed, the complexities of analyzing and managing these large data sets are exploding. One issue is that the sources of the data, such as the telecommunications companies, may have obligations of privacy and anonymity, but there may be a large number of consumers of the data. The consumers may be a wide range of companies which may use the data in countless ways.

SUMMARY

A trajectory may be derived from noisy location data by mapping candidate locations for a user, then finding a match between successive locations. Location data may come from various sources, including telecommunications networks. Telecommunications networks may give location data based on observations of users in a network, and such data may have many inaccuracies. The observations may be mapped to physical constraints, such as roads, pathways, train lines, and the like, as well as applying physical rules such as speed analysis to smooth the data and identify outlier data points. A trajectory may be resampled or interpolated to generate a detailed set of trajectory points from a sparse and otherwise ambiguous dataset.

Mobility observations may be analyzed to create so-called mobility genes, which may be intermediate data forms from which various analyses may be performed. The mobility genes may include a trajectory gene, which may describe a trajectory through which a user may have travelled. The trajectory gene may be analyzed from raw location observations and processed into a form that may be more easily managed. The trajectory genes may be made available to third parties for analysis, and may represent a large number of location observations that may have been condensed, smoothed, and anonymized. By analyzing only trajectories, a third party may forego having to analyze huge numbers of individual observations, and may have valuable data from which to make decisions.

A visit mobility gene may be generated from analyzing raw location observations and may be made available for further analysis. The visit mobility gene may include summarized statistics about a certain location or location type, and in some cases may include ingress and egress travel information for visitors. The visit mobility gene may be made available to third parties for further analysis, and may represent a concise, rich, and standardized dataset that may be generated from several sources of mobility data.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings,

FIG. 1 is a diagram illustration of an example embodiment showing an ecosystem with mobility genes.

FIG. 2 is a diagram illustration of an embodiment showing a network environment with systems for generating mobility genes.

FIG. 3 is a flowchart illustration of an embodiment showing a method for collecting data by a telecommunications network.

FIG. 4 is a flowchart illustration of an embodiment showing a method for requesting and responding to a customized mobility gene order.

FIG. 5 is a flowchart illustration of an embodiment showing a method for generating and responding to a standardized mobility gene order.

FIG. 6 is a flowchart illustration of an embodiment showing a method for generating a trajectory mobility gene.

FIG. 7 is a flowchart illustration of an embodiment showing a method for preparing trajectory mobility genes for transmittal.

FIG. 8 is a flowchart illustration of an embodiment showing a method for processing trajectories into visit mobility genes.

FIG. 9 is a flowchart illustration of an embodiment showing a method for processing raw location observations into visit mobility genes.

FIG. 10 is a diagram illustration of an embodiment showing steps to create a path associated with a trajectory.

FIG. 11 is a diagram illustration of an embodiment showing a path generated from a set of locations.

FIG. 12 is a diagram illustration of an embodiment showing a network environment with a system that calculates a physical path from a trajectory.

FIG. 13 is a flowchart illustration of an embodiment showing a method for generating a transportation graph.

FIG. 14 is a diagram illustration of an example method for creating candidate locations and determining an optimized path through the candidates.

DETAILED DESCRIPTION

Trajectory Analysis from Sparse Location Data

A detailed trajectory may be derived from a sparse and noisy set of sequential location points. For each location point in time, a set of candidate physical locations may be generated from a map of the physical area, then the candidate physical locations may be connected to form a trajectory or path for a user.

A location dataset may include a location and timestamp for a specific device or user. In many cases, the set of location data points may be noisy. For instance, location data supplied from a telecommunications or cellular network may have a high degree of inaccuracy. One such example may be Location Based Service (LBS) location data.

Some telecommunications networks may provide a location data point as merely the location of the cellular tower to which a user may be connected, even though the user may be located a large distance from the tower. Such datasets may have a large accuracy tolerance, and the actual physical location may be anywhere within the covered area of a cellular tower.

Further compounding the inaccuracies of the data, cellular networks may have various rollover or handoff mechanisms that may be deployed for load balancing. For example, a user may attempt to connect to a network with a mobile device, but the closest cellular tower may be nearing capacity. In such a case, the user's device may be connected to a more distant tower with available capacity. Such a situation may result in a user's location data reflecting a more distant tower.

In another example of inaccuracies in location data, many cellular networks may support several different communication bands and communication technologies. A user may have an older device that may not support the newest communication protocols, so their connection may be supplied by one set of towers while another user with a more advanced mobile device may be connected to a different set of towers, even though both users may be in the same physical location. In such a situation, both users are physically located in the same space, but their location data may be different.

The analysis of such noisy and ambiguous location data may begin by identifying candidate physical locations for each location data point. The physical locations may be locations on streets, sidewalks, roads, highways, train tracks, train stations, bus stations, and other physical locations. Once candidate physical locations have been mapped, an analysis may be performed to find a logical sequence of physical locations that a user may have traversed. In a simple example, a logical physical sequence may be to have traversed a roadway in a car or bicycle.
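As a non-limiting illustration, candidate generation and sequence selection may be sketched as follows. The function names, the planar (x, y) coordinates, the fixed search radius, and the cost function (distance from each observation to its candidate plus distance travelled between successive candidates) are assumptions made for this example only and are not taken from the specification.

```python
import math

def candidates_for(observation, road_points, radius):
    """Return the road points lying within `radius` of an observed (x, y)."""
    return [p for p in road_points if math.dist(p, observation) <= radius]

def best_path(observations, road_points, radius):
    """Choose one candidate per observation by dynamic programming.

    Assumed cost: distance from observation to candidate, plus distance
    travelled between successive candidates. Lower total cost wins.
    """
    layers = [candidates_for(o, road_points, radius) for o in observations]
    if not layers or any(not layer for layer in layers):
        raise ValueError("every observation needs at least one candidate")
    # cost[i][j] is the cheapest total cost of ending at candidate j of layer i.
    cost = [[math.dist(c, observations[0]) for c in layers[0]]]
    back = []  # back[i - 1][j] is the best predecessor index for layer i
    for i in range(1, len(layers)):
        row, ptr = [], []
        for c in layers[i]:
            options = [cost[i - 1][k] + math.dist(prev, c)
                       for k, prev in enumerate(layers[i - 1])]
            k_best = min(range(len(options)), key=options.__getitem__)
            row.append(options[k_best] + math.dist(c, observations[i]))
            ptr.append(k_best)
        cost.append(row)
        back.append(ptr)
    # Walk the back pointers from the cheapest final candidate.
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path = [layers[-1][j]]
    for i in range(len(layers) - 1, 0, -1):
        j = back[i - 1][j]
        path.append(layers[i - 1][j])
    return path[::-1]

# Two parallel roads; the noisy observations fall between them.
roads = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
observed = [(0, 0.4), (1, 0.6), (2, 0.4)]
print(best_path(observed, roads, 0.7))
```

In this example, picking the nearest candidate for each observation independently would jump between the two roads, while the sequence-level cost keeps the path on a single road. Other cost functions, such as costs based on road network distance, may be substituted.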

The analysis may further refine a sequence of physical locations into a trajectory by identifying any outliers or inconsistent location points. Such inconsistencies may be identified by impractical or physically impossible changes in speed or direction, by illogical traffic routing, or other inconsistencies. In such cases, inconsistent data may be removed from the trajectory. In some cases, the location sequence may be recalculated with the inconsistent data removed or de-emphasized.
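One simple form of speed analysis may be sketched as follows. The point layout (time in seconds, x and y in meters), the function name, and the single speed threshold are assumptions for illustration. The sketch also assumes the first point is trustworthy, which may not hold in practice.

```python
import math

def remove_speed_outliers(points, max_speed):
    """Drop points that imply a speed above `max_speed` (meters per second).

    `points` is a list of (t_seconds, x_m, y_m) tuples sorted by time.
    Each point is compared against the last point that was kept.
    """
    if not points:
        return []
    kept = [points[0]]
    for t, x, y in points[1:]:
        t0, x0, y0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        if math.hypot(x - x0, y - y0) / dt <= max_speed:
            kept.append((t, x, y))
    return kept

track = [(0, 0, 0), (10, 100, 0), (20, 5000, 0), (30, 300, 0)]
print(remove_speed_outliers(track, max_speed=40))
# The third point would need 490 m/s and is dropped.
```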

Once a trajectory may be established, the trajectory may be resampled or interpolated between the established data points. Such a process may add location data points to a trajectory to make the trajectory more useful for subsequent analyses.
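Resampling at a fixed time step may be sketched with straight-line interpolation, as below. The function name and the (t, x, y) tuple layout are assumptions for illustration; an embodiment may instead interpolate along the matched road geometry.

```python
def resample(points, step):
    """Linearly interpolate a trajectory at every `step` seconds.

    `points` is a list of (t, x, y) tuples with strictly increasing t.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if len(points) < 2:
        return list(points)
    out = []
    i = 0
    t = points[0][0]
    while t <= points[-1][0]:
        # Advance to the segment that contains time t.
        while points[i + 1][0] < t:
            i += 1
        t0, x0, y0 = points[i]
        t1, x1, y1 = points[i + 1]
        f = (t - t0) / (t1 - t0)
        out.append((t, x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
        t += step
    return out

sparse = [(0, 0.0, 0.0), (10, 100.0, 0.0), (30, 100.0, 200.0)]
for point in resample(sparse, step=5):
    print(point)
```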

Mobility Genes as Representations of Location Observations

Mobility genes may condense large numbers of location observations into a compact, meaningful, and easily digestible dataset for subsequent analyses. The mobility genes may be one way that telecommunications service providers may aggregate and process their location observations into various formats that may be sold and consumed by other companies to provide meaningful and useful analyses.

The mobility genes may be a second tier of raw location data. Raw location data may come in enormous quantities, the volume of which may be overwhelming. By condensing the raw location data into different mobility genes, the subsequent analyses may be much more achievable, while also maintaining anonymity of the users whose observations may be protected by convention or law.

Raw location data may be produced in enormous volumes. In modern society, virtually every person has at least one cellular telephone or other connected device. The devices continually ping with a cellular access point or tower, where each ping may be considered a location observation. In a single day in a medium sized city, billions of location observations may be collected.

Making meaningful judgments from these enormous datasets can be computationally expensive. In many cases, small samples of the larger dataset may be used to estimate various factors from the data.

By pre-processing the raw location observations into a set of mobility genes, a data provider may make these enormous datasets available for further analysis without the huge computational complexities. In many cases, the mobility genes may be anonymized, smoothed, augmented with additional data, and may be succinct enough and rich enough to make meaningful analyses without violating a telecommunications network's obligation of privacy to their customers. Further, the pre-processing of the data into mobility genes may transfer much of the computational cost to the data provider, which may unburden the data consumers from expensive data handling.

Mobility Gene for Trajectory Data

Location observations may be condensed into trajectory data that may be made available for various secondary analyses. Location observations may come from many different sources, including location observations made by telecommunications companies, such as cellular telephony providers, wireless access providers, and other communications providers.

The trajectory data may be useful for many different analyses, such as traffic patterns, behavioral studies, customer profiling, commercial real estate analyses, anomaly detection, and others. The trajectory mobility gene may condense millions or billions of location observations into a form that may be easily digested into meaningful analyses and decisions.

The mobility gene may represent a mechanism by which a data supplier may digest large numbers of observations into a dense, useful, and anonymous format that may be consumed by a third party. The third party may be a separate company that may further process the mobility gene into a decision-making tool for various applications.

By using a mobility gene, a data provider, such as a telecommunications service provider, may be able to pre-process large numbers of data into an intermediate format for further analysis. The mobility gene may be a format for making data available through an application programming interface (API) or some other mechanism.

The trajectory mobility gene condenses many location observations into a series of points or trajectories where a device was observed. This pre-processing may increase the value of the trajectory data, as well as make the trajectory data easier to analyze and digest. In many cases, the pre-processing may also attach various demographic information about the users associated with the trajectories.

The trajectories may be smoothed, which may be useful in cases where the observations may have location or time variations or tolerances. For example, many location observations may be made using an access point location or some form of triangulation between multiple access points. Such location observations may have an inherent level of tolerance or uncertainty, which may lead to trajectories that may be physically impossible, as the speed between each point may be unattainable using conventional transportation mechanisms.

Demographic information about the users may be added to the trajectory data. In many cases, a data provider may have secondary information about a user, such as the user's gender, actual or approximate age, home and work locations, actual or approximate income, family demographics, and other information. Such demographics may be associated with each trajectory, and may be used for supplying subsets of trajectories for third party analysis.

Trajectories may be anonymized in some cases. A user's trajectory may reveal certain personally identifiable information (PII) about a user. For example, a user's commuting trajectory may identify the user's home and work locations. With such information, a specific user may be identified. Anonymization of this data may be performed in several different ways.

One way to anonymize a trajectory may be to truncate the trajectory to omit an origin, destination, or both, while keeping a portion of a trajectory of interest. For example, a set of trajectories may be truncated to only show movement trajectories through a specific portion of a road or train station. Such truncations may omit the user's origin and destinations, but may give a third party traffic analysis service meaningful and useful trajectories from which the service may show local traffic patterns.
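Truncation to a region of interest may be sketched as follows. Representing the region as an axis-aligned bounding box, and the function name itself, are assumptions for illustration; an embodiment may use an arbitrary polygon or a road segment identifier.

```python
def truncate_to_region(trajectory, region):
    """Keep only the trajectory points inside a bounding box.

    `trajectory` is a list of (t, x, y) tuples and `region` is
    (min_x, min_y, max_x, max_y). Points outside the box, including the
    origin and destination, are discarded.
    """
    min_x, min_y, max_x, max_y = region
    return [(t, x, y) for t, x, y in trajectory
            if min_x <= x <= max_x and min_y <= y <= max_y]

commute = [(0, 5, 5), (60, 40, 10), (120, 55, 12), (180, 70, 15), (240, 95, 90)]
station_area = (30, 0, 80, 20)
print(truncate_to_region(commute, station_area))
# Only the three points inside the station area remain.
```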

Another way to anonymize a trajectory may be to generalize or randomize an origin or destination of a trajectory. In many cases, a trajectory may have location observations with a certain accuracy range or tolerance. Such accuracy may help identify a person's home or other destination very specifically. One way to anonymize the trajectory may be to identify an origin or destination with a general area, such as a centroid of a housing district. All trajectories beginning or ending at the housing district may be assigned to the centroid of the housing district, and thereby an individual trajectory cannot be used to identify a specific resident of the housing district.
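Generalizing endpoints to a district centroid may be sketched as follows. The district representation (a bounding box paired with a precomputed centroid) and the function names are assumptions for illustration. This sketch replaces only the first and last points; intermediate points close to an endpoint may also need treatment in practice.

```python
def snap_to_centroid(point, districts):
    """Replace a point's location with the centroid of its district, if any."""
    t, x, y = point
    for (min_x, min_y, max_x, max_y), (cx, cy) in districts:
        if min_x <= x <= max_x and min_y <= y <= max_y:
            return (t, cx, cy)
    return point  # not inside any known district; left unchanged

def generalize_endpoints(trajectory, districts):
    """Snap the origin and destination of a trajectory to district centroids."""
    if not trajectory:
        return []
    out = list(trajectory)
    out[0] = snap_to_centroid(out[0], districts)
    out[-1] = snap_to_centroid(out[-1], districts)
    return out

# Each district is (bounding box, centroid); both values are hypothetical.
districts = [((0, 0, 10, 10), (5, 5)), ((90, 90, 100, 100), (95, 95))]
trip = [(0, 1, 2), (60, 50, 50), (120, 98, 91)]
print(generalize_endpoints(trip, districts))
```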

Mobility Gene for Visit Data

A mobility gene for visits may be one mechanism to aggregate and condense location observations into an intermediate form for further analysis. A visit gene may represent summarized location data that reflect user behavior with respect to a certain location or location type.

The visit mobility gene may be derived from telecommunications observations and other sources, and may be an intermediate form of processed data that may be made available to third parties for analysis. In many cases, the visit mobility gene, as well as other mobility genes, may be made available for sale or consumption by third parties, and may be a revenue source for telecommunications companies and other companies that may gather location observations.

A visit mobility gene may represent a rich set of data that may be derived from location observations. In many cases, a visit mobility gene may represent movements relating to a specific location, such as a train station, store, recreational location, or some other specific location. In some cases, a visit mobility gene may represent an aggregation of visits to a specific type of location, such as a user's home, work, or recreational location.

A visit may be determined by a user's location observations being constant or within a certain radius for a period of time. In some cases, a visit may be derived by analyzing location observations to find all location observations that may be within a specific area, then analyzing users' behavior to determine if the users remained in the area for a period of time. In other cases, a visit may be derived by computing a user's trajectory and analyzing the trajectory for periods where the user's movements have stopped or remain within a small area. In such cases, a visit mobility gene may be a secondary analysis of a trajectory mobility gene.
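Detecting visits from a trajectory may be sketched as follows. The radius and minimum duration parameters, the use of the first point of a stay as its anchor, and the function name are assumptions for illustration only.

```python
import math

def find_visits(points, radius, min_duration):
    """Find periods where a user stayed within `radius` of an anchor point.

    `points` is a list of (t, x, y) tuples sorted by time. Returns a list
    of (start_t, end_t, x, y) tuples, where (x, y) is the anchor location
    and end_t - start_t is at least `min_duration`.
    """
    visits = []
    i, n = 0, len(points)
    while i < n:
        t0, x0, y0 = points[i]
        j = i
        # Extend the stay while following points remain near the anchor.
        while j + 1 < n and math.hypot(points[j + 1][1] - x0,
                                       points[j + 1][2] - y0) <= radius:
            j += 1
        if points[j][0] - t0 >= min_duration:
            visits.append((t0, points[j][0], x0, y0))
            i = j + 1
        else:
            i += 1
    return visits

track = [(0, 0, 0), (60, 500, 0), (120, 505, 3), (600, 498, -2),
         (1200, 502, 1), (1260, 900, 0)]
print(find_visits(track, radius=20, min_duration=600))
```

Length of stay and time of day may then be read directly from the start and end timestamps of each detected visit.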

A visit gene may include time of day, length of stay, and various other statistics. A visit gene may also include information before and after a person's visit. For example, a visit gene may include trajectories before and after a person's visit to a location. A visit gene may be supplemented with demographic information about visitors, such as actual or approximate age, gender, actual or approximate home and work locations, actual or approximate income, as well as hobbies, common other locations visited, and other information.

Throughout this specification, like reference numbers signify the same elements throughout the description of the figures.

In the specification and claims, references to “a processor” include multiple processors. In some cases, a process that may be performed by “a processor” may be actually performed by multiple processors on the same device or on different devices. For the purposes of this specification and claims, any reference to “a processor” shall include multiple processors, which may be on the same device or different devices, unless expressly specified otherwise.

When elements are referred to as being “connected” or “coupled,” the elements can be directly connected or coupled together or one or more intervening elements may also be present. In contrast, when elements are referred to as being “directly connected” or “directly coupled,” there are no intervening elements present.

The subject matter may be embodied as devices, systems, methods, and/or computer program products. Accordingly, some or all of the subject matter may be embodied in hardware and/or in software (including firmware, resident software, micro-code, state machines, gate arrays, etc.). Furthermore, the subject matter may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. In the context of this document, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

The computer-usable or computer-readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media.

Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by an instruction execution system. Note that the computer-usable or computer-readable medium could be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.

When the subject matter is embodied in the general context of computer-executable instructions, the embodiment may comprise program modules, executed by one or more systems, computers, or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.

FIG. 1 is an illustration showing an example embodiment 100 of an ecosystem with mobility genes. A mobile device 102 may connect to various access points 104, which may be managed by a network operator 106. Each communication with the mobile device 102 may be stored as raw location data 108.

A location data processor 110 may analyze the raw location data 108 to generate a set of mobility genes 112. The mobility genes 112 may be transferred to various analyzers 114, 116, and 118 for subsequent analysis.

The location data processor 110 may process the raw location observations into mobility genes 112, which may be sold or transferred to third parties who may perform various analyses. The mobility genes 112 may be a condensed, succinct, and useful intermediate data format that may be consumed by third parties while keeping user anonymity. In many cases, the location data processor 110 may augment the raw location data with secondary data sources, as well as provide smoothing and other processing that may increase data usefulness and, in some cases, improve data accuracy.

The various mobility genes 112 may be a standardized mechanism by which third party data analyzers may access a very rich and very detailed set of location data 108. A location data processor 110 may analyze billions of raw location observations and distill the data into mobility genes 112 that may be easily consumed without the high data handling costs and high data processing costs of analyzing enormous numbers of location observations.

The mobility genes 112 may be an industrial standard format that may preserve user anonymity yet may increase the value of specific data that may be used by third party analyzers. The mobility genes 112 may come in many formats, including trajectories and visits.

The mobility genes 112 may come in historical and real time data formats. A historical data format may include mobility genes that may have been derived over a relatively long period of time, such as a week, month, or year. A real time format may present mobility genes that may be occurring currently, or over a relatively short period of time, such as over a minute, hour, or day. Each use case and each system may have a different definition for “historical” and “real time.” For example, in some systems, real time may be mobility genes derived in the last several seconds, while another system may define real time as data collected in the last week.

Real time data formats may be useful for providing alerts, providing current data, or making real time decisions about people's mobility. One use for real time data may be to display traffic congestion on a road or to estimate travel time through a city. Another use of real time data may be to predict the number of travelers that may be at a taxi stand in the next several minutes or in the next hour.

Real time data formats may be used to compare current events to historical behaviors. Historical analysis may provide an estimate for events that may happen today or some period in the future, and by comparing historical estimates with real time data, an anomaly may be detected or an estimate for future traffic may be increased or decreased accordingly.

FIG. 2 is a diagram of an embodiment 200 showing components that may analyze raw location data and provide mobility genes for subsequent analyses. The example of embodiment 200 is merely one topology that may be used to analyze raw location data.

The diagram of FIG. 2 illustrates functional components of a system. In some cases, the component may be a hardware component, a software component, or a combination of hardware and software. Some of the components may be application level software, while other components may be execution environment level components. In some cases, the connection of one component to another may be a close connection where two or more components are operating on a single hardware platform. In other cases, the connections may be made over network connections spanning long distances. Each embodiment may use different hardware, software, and interconnection architectures to achieve the functions described.

Embodiment 200 illustrates a device 202 that may have a hardware platform 204 and various software components. The device 202 as illustrated represents a conventional computing device, although other embodiments may have different configurations, architectures, or components.

In many embodiments, the device 202 may be a server computer. In some embodiments, the device 202 may still also be a desktop computer, laptop computer, netbook computer, tablet or slate computer, wireless handset, cellular telephone, game console or any other type of computing device. In some embodiments, the device 202 may be implemented on a cluster of computing devices, which may be a group of physical or virtual machines.

The hardware platform 204 may include a processor 208, random access memory 210, and nonvolatile storage 212. The hardware platform 204 may also include a user interface 214 and network interface 216.

The random access memory 210 may be storage that contains data objects and executable code that can be quickly accessed by the processors 208. In many embodiments, the random access memory 210 may have a high-speed bus connecting the memory 210 to the processors 208.

The nonvolatile storage 212 may be storage that persists after the device 202 is shut down. The nonvolatile storage 212 may be any type of storage device, including hard disk, solid state memory devices, magnetic tape, optical storage, or other type of storage. The nonvolatile storage 212 may be read only or read/write capable. In some embodiments, the nonvolatile storage 212 may be cloud based, network storage, or other storage that may be accessed over a network connection.

The user interface 214 may be any type of hardware capable of displaying output and receiving input from a user. In many cases, the output display may be a graphical display monitor, although output devices may include lights and other visual output, audio output, kinetic actuator output, as well as other output devices. Conventional input devices may include keyboards and pointing devices such as a mouse, stylus, trackball, or other pointing device. Other input devices may include various sensors, including biometric input devices, audio and video input devices, and other sensors.

The network interface 216 may be any type of connection to another computer. In many embodiments, the network interface 216 may be a wired Ethernet connection. Other embodiments may include wired or wireless connections over various communication protocols.

The software components 206 may include an operating system 218 on which various software components and services may operate.

A raw location receiver 220 may receive raw location data from one or more networks 242 or other sources. The raw location receiver 220 may have a push or pull communication model with a raw location data source, and may receive real time or historical data for analysis. The raw location receiver 220 may store information in a raw location database 222.

A batch analysis engine 224 or a real time analysis engine 226 may route the raw location data 222 into various analyzers for processing. The analyzers may include a trajectory analyzer 228, a visit analyzer 230, and a statistics generator 232. The analysis may result in mobility genes 234, which may be served to various analyzers through a real time analysis portal 236 or a batch level analysis portal 238.

In the example of embodiment 200, a batch analysis engine 224 may analyze historical data to create historical mobility genes. The results of batch-level analysis may be available through a batch level analysis portal 238, where other analyzers may download and use mobility genes. A batch-level analysis may be analyses that may not have a real-time use case. For example, a commercial developer may wish to know the demographics of people who travel near a commercial shopping mall. Such an analysis may be performed in batch mode because the data may not be changing rapidly.

A real time analysis engine 226 may perform real-time analysis of location observations, and may be tuned to process data quickly. In many cases, the real time analysis engine 226 may generate comparison versions of a mobility gene. A comparison version may be a difference or comparison between a set of real time observations and a predefined, historical mobility gene. This difference may be useful for generating alerts, for example. In some cases, the difference information may be much more compact than having to access an entire set of mobility genes.
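One possible form of a comparison version may be sketched as follows. Summarizing each gene as a count of observed users per road segment, the threshold parameter, and the function name are assumptions for illustration; the specification does not prescribe a particular representation.

```python
def comparison_gene(realtime_counts, historical_counts, threshold):
    """Return only the segments whose live count departs from the baseline.

    Both inputs map a segment identifier to a count of observed users.
    The result maps a segment to (live - historical) wherever the absolute
    difference exceeds `threshold`, which keeps the output compact.
    """
    diff = {}
    for segment in realtime_counts.keys() | historical_counts.keys():
        delta = realtime_counts.get(segment, 0) - historical_counts.get(segment, 0)
        if abs(delta) > threshold:
            diff[segment] = delta
    return diff

live = {"A": 120, "B": 48, "C": 10}
baseline = {"A": 100, "B": 50, "D": 30}
print(comparison_gene(live, baseline, threshold=5))
# Segment B is within the threshold and is omitted from the result.
```

A downstream consumer may raise an alert for any segment present in the result, without retrieving the full historical gene.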

A trajectory analyzer 228 may create trajectories from raw location data222. The trajectories may include sequences of locations traveled by auser, including timestamps for each of the observed locations. Thetrajectories may be processed into a useable form by scrubbing andsmoothing the data, as well as removing duplicate or superfluousobservations.

A visit analyzer 230 may identify visits for a given location. In somecases, the visits may be inferred or determined from subsequent analysisof trajectories. In other cases, visits may be identified by finding alllocation observations for a given location, then finding data associatedwith those visits.

A statistics generator 232 may generate various statistics for a given mobility gene. In some cases, the statistics generator 232 may access various static data sources 256 or real time or dynamic data sources 258 to augment a mobility gene.

The real time analysis portal 236 and batch level analysis portal 238 may be a computer or web interface through which data may be queried and received. In a typical use case, a third party analyzer may send a request to one of the portals 236 or 238 for a set of mobility genes. After verifying the requestor's credentials, the portal may cause the data to be generated if the mobility genes have not been calculated, then the mobility genes may be transmitted to the requestor.

The system 202 may be connected to various other devices and services through a network 240.

One or more telecommunications networks 242 may supply raw location data to the system 202. The telecommunications networks 242 may be cellular telephony networks, wireless data networks, networks of passive wireless sniffers, or any other network that may supply location information.

In a typical network, a wireless mobile device 244, which may have a Global Positioning System (GPS) receiver 246, may connect with a telecommunications network 248 through a series of access points. Various location data 250 may be generated from the mobile device interactions, including GPS location data that may be generated by the mobile device 244 and transmitted across the telecommunications network 242.

The location data 250 may be cleaned and scrubbed with a data scrubber 252 to provide raw location data 254 that may be processed by the system 202. In many cases, the location data 250 may include device identifiers and other potentially personally identifiable information. The data scrubber 252 may replace device identifiers with other, non-traceable identifiers and perform other pre-processing of the location data.

One form of telecommunications location data may include location data that may be gathered from monitoring a device location in a cellular telephony system. In some such systems, the location data may include the location coordinates of an access point, which may be close to but not exactly the location of the device. Some cellular networks may have cells that span large distances, such as multiple kilometers or miles, and the accuracy of the location information may be very poor. Other telecommunications systems may use triangulation between two, three, or more access points to determine location with a higher degree of accuracy.

In some cases, a GPS receiver in a mobile device may generate coordinates and may transmit the coordinates as part of a data message from the mobile device 244. Such GPS coordinates may have much higher accuracy than other location mechanisms, but GPS coordinates may not be transmitted as often as data from other location mechanisms. In some systems, location observations may have different degrees of accuracy, such that some observations may be generated by GPS and other observations may be determined through triangulation or merely access point locations. Such accuracy differences may be used during mobility gene calculations.

Static data sources 256 and dynamic data sources 258 may represent any type of supplemental data sources that may be used to generate mobility genes. An example of a static data source 256 may be a map of highways, roads, train systems, bus systems, pedestrian paths, bicycle paths, and other transportation routes. Another example may be the name and location of various places of interest, such as shopping malls, parks, stores, train stations, bus stops, restaurants, housing districts, factories, offices, and other physical locations.

Another set of static data sources 256 may be demographic information about people. Such information may be known by a telecommunications network 242 because the network may have name, address, credit card, and other information about each of its subscribers. In some cases, a telecommunications network 242 may augment its raw location data 254 with demographic information.

An example of dynamic data sources 258 may be a current train, bus, airplane, or ferry schedule, the current number of taxis available, or any other data source.

The static and dynamic data sources 256 and 258 may augment a mobility gene. For example, a data analyzer may request mobility gene information for fast food restaurants in a specific city. The system 202 may identify each of the fast food restaurants from a secondary data source, then identify visits and trajectories that may relate to each of the fast food restaurants.

A set of data consumers 260 may be third party organizations that may consume the mobility gene data. The data consumers 260 may have a hardware platform 260 on which various analysis applications 262 may execute. In some cases, the data consumers 260 may be third party services that may consume the mobility genes and provide location-based services, such as traffic monitoring and a host of other services.

FIG. 3 is a flowchart illustration of an embodiment 300 showing a method of generating location observations. Embodiment 300 is a simplified example for a sequence of generating location observations that may be performed by a telecommunications network.

Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or sets of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operation in a simplified form.

Embodiment 300 illustrates two ways of determining a location observation, along with a way to scrub device-specific identifiers from the observations.

One way to create a location observation may be to detect a device on the network in block 302. A location for the device may be determined in block 304, along with a timestamp in block 306. The resultant location observation may be stored in block 308.

Each location may be determined by the network. In some cases, a network may establish an approximate location for the device, which may be sufficient for managing the traffic on the network. However, in many cases, such location coordinates may be inaccurate. For example, some networks may provide a location as the location of the access point, cell tower, or other fixed node on the network. Any device detected by that node may be located anywhere within the range of the access point, which may be several kilometers or miles. Such location information may have a large tolerance or variation from the actual location.

Some networks may provide a location estimate based on triangulation of a device with two, three, or more access points or other receivers. Such a location may be more accurate than the example of providing merely the access point physical location, but may not be as accurate as GPS location.

In block 310, a network may detect that GPS location information may be transmitted over the network. Such information may be captured, a timestamp generated in block 312, and a location observation may be stored in block 314. Such an example may be one method by which GPS information may be captured and stored as location information.

In some systems, certain applications may execute on a device and may generate GPS location information. For example, navigation applications typically send a stream of GPS location data to a server, which may update directions for a user. Such applications may be detected, and the GPS locations may be used as highly accurate location observations.

A typical location observation may include a device identifier, a set of location coordinates, and a timestamp. The device identifier used in a wireless network may depend on the network. Typically, a device may have some type of electronic identification, such as a Media Access Control (MAC) address, Electronic Identification Number (EIN), or other device identifier. In many cases, such identifiers may be a mechanism by which other systems may also identify the device.

A device identifier may be one mechanism by which a mobility gene may be directly linked to a specific user. In general, the raw data for mobility genes may be collected by one group of actors who may have strict privacy regulations to which they have to adhere, but may sell mobility genes to a third party. A device identifier may be one way that a third party may connect specific mobility data to specific users.

In order to obfuscate identifiable information from the location observations, each observation may be analyzed in block 316, and a unique identifier for the device may be generated in block 318 and substituted for the actual device identifier in block 320. The location observation may be updated in block 322.

The unique identifier may be the same identifier for that device in the particular dataset being analyzed. In some cases, a lookup table may be created that may have the device identifier and its unique replacement. Such a system may use the same substituted device identifier for observations over a long period of time.
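The identifier substitution of blocks 316 through 322 may be sketched as follows. This is a minimal illustration only: the function name `pseudonymize`, the tuple layout of an observation, and the use of random UUIDs as replacement identifiers are assumptions and are not taken from the embodiment.

```python
import uuid


def pseudonymize(observations):
    """Swap each device identifier for a stable, random substitute.

    observations: iterable of (device_id, lat, lon, timestamp) tuples
    (a hypothetical layout chosen for this sketch).
    Returns the scrubbed observations and the lookup table that was built.
    """
    lookup = {}    # actual device identifier -> substitute identifier
    scrubbed = []
    for device_id, lat, lon, timestamp in observations:
        if device_id not in lookup:
            # Random value, so the substitute cannot be reversed without the table.
            lookup[device_id] = uuid.uuid4().hex
        scrubbed.append((lookup[device_id], lat, lon, timestamp))
    return scrubbed, lookup
```

Because the lookup table is the only link back to the actual identifier, retaining the table may allow the same substitute to be reused over a long period, while withholding it from a data consumer may keep the observations non-traceable.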

After updating all of the observations, the updates may be sent to a mobility gene analyzer in block 324.

FIG. 4 is a flowchart illustration of an embodiment 400 showing interactions between a mobility gene provider 402 and a data consumer 404. The operations of the mobility gene provider 402 are illustrated in the left hand column, while the operations of the data consumer 404 are illustrated in the right hand column.

Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or sets of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operation in a simplified form.

Embodiment 400 is one method by which a mobility gene may be requested and provided. A mobility gene provider 402 may be a system that may process raw location observations into a set of mobility genes. The mobility genes may be consumed by the data consumer 404. In many situations, the mobility genes may be a compact form of location observations that may be ready for further processing by a data consumer 404.

The mobility genes may represent many thousands, millions, billions, or even trillions of individual observations that may be condensed into various mobility genes. By pre-processing the location observations into a set of mobility genes, the high cost and complexity of analyzing enormous numbers of observations may be avoided. Further, a set of mobility genes may be anonymized or summarized such that the data may be handled without worry of disclosing personally identifiable information. Such restrictions may be imposed by law or convention, and the cost of implementing the restrictions may be borne by the mobility gene provider 402 and may not be passed to the data consumer 404.

In the example of embodiment 400, a data consumer 404 may define a mobility gene in block 406, then transmit that definition in block 408 to the mobility gene provider 402.

The mobility gene provider 402 may receive the definition in block 410, analyze raw location data in block 412, create the mobility genes in block 414, and store the mobility genes in block 416.

In many cases, the mobility gene may be processed from historical data. Such mobility genes may be processed in a batch mode. Some requests may be for real time data, and such mobility genes may be continually processed and updated.

In the example of embodiment 400, a data consumer 404 may request data in block 418, which may be received in block 420 by the mobility gene provider 402. The mobility gene provider 402 may transmit the mobility genes in block 422, which may be received by the data consumer in block 424. The mobility genes may be analyzed in block 426 to provide various location based services in block 428.

The example of embodiment 400 in blocks 418-428 may be one example of a pull-style communication protocol, where the data consumer 404 may initiate a request. Other systems may use a push-style communication protocol, where the mobility gene provider 402 may initiate a data transfer. Still other systems may use other types of communication protocols for transferring mobility genes from a mobility gene provider 402 to a data consumer 404.

FIG. 5 is a flowchart illustration of an embodiment 500 showing interactions between a mobility gene provider 502 and a data consumer 504. The operations of the mobility gene provider 502 are illustrated in the left hand column, while the operations of the data consumer 504 are illustrated in the right hand column.

Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or sets of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operation in a simplified form.

Embodiment 500 is an example of an interaction where a data consumer 504 may use a standard, pre-computed mobility gene. A mobility gene provider 502 may analyze raw location data in block 506, create a standardized set of mobility genes in block 508, and store the mobility genes in block 510. Such a process may loop over and over as new data may be received.

A standardized set of mobility genes may be pre-defined and may be ready to use. One form of such genes may be a subscription service or a data marketplace, where many different data consumers 504 may purchase or consume a pre-defined set of mobility genes.

Such a system may be contrasted with the example of embodiment 400, where a data consumer may define various parameters about a requested mobility gene.

A data consumer 504 may determine a standard mobility gene for an application in block 512. In many cases, a mobility gene provider 502 may provide a catalog of mobility genes that may be useful for various applications. Such mobility genes may be standardized and may be offered on a subscription or other basis to one or more data consumers.

The data consumer 504 may request mobility genes in block 514, and the request may be received in block 516 by the mobility gene provider 502. The mobility genes may be transmitted in block 518 and received in block 520. A data consumer 504 may analyze the mobility genes in block 522 and provide a location based service in block 524.

FIG. 6 is a flowchart illustration of an embodiment 600 showing a method for creating trajectory mobility genes. The method of embodiment 600 may be merely one example of how trajectories may be created from raw location observations.

Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or sets of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operation in a simplified form.

Embodiment 600 is one example of how trajectory mobility genes may be generated. A trajectory gene may define a path that a user may have traveled. In many cases, a trajectory gene may include a transportation mode.

Trajectory genes may be smoothed. In many cases, location observations may not be very precise. For example, some raw location data may give a user's location as the location of an access point, which may be a large distance from the actual location. In some cases, such variation may be on the order of tens or hundreds of feet, or in some cases miles or kilometers.

A smoothing algorithm may adjust a trajectory such that the movement may make physical sense. Some such smoothing algorithms may increase a trajectory's accuracy.

Some smoothing or post processing algorithms may adjust a trajectory as part of an anonymizing process. Trajectories can contain information that may identify people specifically. For example, a trajectory from a person's home address to their work address may indicate exactly who the person may be. By obfuscating one or both of the origin or destination, the trajectory may be made anonymous, while preserving useful portions of the trajectory for analysis.

Many mobility genes may include demographic information about a user. The demographic information may be any type of descriptor or categorization of the user. Many systems may classify users by gender, age or age group, income, race, education, and so on. Some systems may include demographics that may be derived from location observation data, such as predominant mode of transport, recreational sites visited, types of restaurants visited, and the like.

Raw location observations may be received in block 602.

A timeframe of interest may be determined in block 604. In some analyses, a time frame may be defined by trajectories in the last hour, day, or week. In other analyses, a time frame may be defined by trajectories at a specific recurring time, such as between 9:15 and 9:30 am on Tuesdays that are not holidays. Location observations meeting the timeframe of interest may be gathered for the analysis.

The observations may be sorted by device identification in block 606. For each device identification in block 608, a subset of observations may be retrieved in block 610 that have the device identification. The subset may be sorted by timestamp in block 612 and a raw trajectory may be created by the sequence of location observations in block 614.
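Blocks 604 through 614 may be sketched as a grouping step followed by a sort. The observation layout and the function name `build_raw_trajectories` below are assumptions made for illustration.

```python
from collections import defaultdict


def build_raw_trajectories(observations, start, end):
    """Group observations by device and order each group by time.

    observations: iterable of (device_id, lat, lon, timestamp) tuples
    (a hypothetical layout). Only observations whose timestamp falls in
    the closed interval [start, end] are kept.
    Returns {device_id: [(timestamp, lat, lon), ...]} sorted by timestamp.
    """
    by_device = defaultdict(list)
    for device_id, lat, lon, timestamp in observations:
        if start <= timestamp <= end:
            by_device[device_id].append((timestamp, lat, lon))
    # Placing the timestamp first makes a plain sort chronological.
    return {device: sorted(points) for device, points in by_device.items()}
```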

For each sequence in block 616, the trajectory may be broken into segments based on the trajectory speed in block 618. In other words, a trajectory segment may be created by identifying locations where the trajectory may have paused for an extended time. An example may be a trajectory that may pause while a person is at work, at home, at a recreational event, or visiting some location.
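One possible reading of block 618 is sketched below. The sketch assumes that points have already been projected into planar coordinates in meters, and the speed threshold and minimum pause duration are placeholder values rather than values given by the embodiment.

```python
import math


def split_on_pauses(points, max_pause_speed=0.5, min_pause_s=600.0):
    """Split a trajectory into moving segments separated by long pauses.

    points: list of (t, x, y) sorted by t, with t in seconds and x, y in
    meters (an assumed, already-projected layout).
    A pause is a run of consecutive hops at or below max_pause_speed (m/s)
    lasting at least min_pause_s in total.
    """
    runs, run_start = [], None
    for i in range(len(points) - 1):
        (t0, x0, y0), (t1, x1, y1) = points[i], points[i + 1]
        dt = t1 - t0
        slow = dt > 0 and math.hypot(x1 - x0, y1 - y0) / dt <= max_pause_speed
        if slow and run_start is None:
            run_start = i
        if not slow and run_start is not None:
            runs.append((run_start, i))
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(points) - 1))

    # Keep only the slow runs that last long enough to count as a pause.
    pauses = [(a, b) for a, b in runs if points[b][0] - points[a][0] >= min_pause_s]

    segments, begin = [], 0
    for a, b in pauses:
        if a > begin:
            segments.append(points[begin:a + 1])
        begin = b
    if begin < len(points) - 1:
        segments.append(points[begin:])
    return segments
```

Each returned segment shares its end points with the neighboring pause, so the point at which movement stopped and the point at which it resumed are both preserved.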

For each segment in block 620, a transportation mode may be determined in block 622 and an average speed determined in block 624. The transportation mode may be inferred by the specifics of a trajectory. For example, a person who progresses slowly at a walking pace to a train station, then moves quickly at a train's speed may be assumed to have walked to the train station and ridden a train. Another person who lingers at a bus stop for a period of time, then travels at a common speed of vehicular traffic may be assumed to be riding a bus. Yet another person who travels on a motorway but begins and ends a journey away from bus stops may be assumed to travel by car or taxi.

In some embodiments, a user's previous history may be used as an indicator for their preferred transportation mode. Some systems may look back to previous transportation analyses for hints or indicators as to whether a specific user often uses a car or train.

The following several steps may be one way to smooth the trajectory and, in some cases, increase its accuracy. Some location observations may have positional data that may be highly inaccurate. The inaccuracies may come from the method used to determine a user's location, which may include giving only the coordinates of an access point or cell tower, even though the user may be a long distance away from the access point or cell tower. In such cases, the trajectory information may give unrealistic movements, such as lingering for a period of time at one access point, then instantaneously moving a long distance to a second access point. Such movements are not physically possible, so by smoothing the trajectory, the trajectory may become more accurate and more useful for further analyses.

Once a transportation mode is determined in block 622, an average speed may be determined in block 624. The average speed may be calculated from the end points of a trajectory segment.

A baseline speed range for the travel segment may be determined from historical data in block 626. The baseline speed may be used as a comparison to determine whether the observed speeds appear appropriate. For each observation in block 628, a speed comparison may be made in block 630. If the speed appears appropriate in block 630, no changes may be made. If the speed does not appear to be appropriate in block 630, the observed location may be adjusted in block 632 to meet the speed limits determined from the historical data.
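The comparison and adjustment of blocks 628 through 632 may be sketched as follows. The sketch enforces only the upper bound of the baseline range, assumes planar coordinates in meters, and adjusts an offending observation by pulling it back along the straight line toward the previous adjusted point. That adjustment rule is an assumption of this sketch; the embodiment does not specify how the location is moved.

```python
import math


def clamp_to_max_speed(points, max_speed):
    """Pull implausibly fast observations back toward the previous point.

    points: list of (t, x, y) sorted by t, with t in seconds and x, y in
    meters (assumed layout). max_speed: upper baseline speed in m/s.
    Returns a new list; timestamps are left unchanged.
    """
    if not points:
        return []
    adjusted = [points[0]]
    for t, x, y in points[1:]:
        prev_t, prev_x, prev_y = adjusted[-1]
        dt = t - prev_t
        dist = math.hypot(x - prev_x, y - prev_y)
        allowed = max_speed * dt
        if dt > 0 and dist > allowed:
            # Move the point along the same bearing until the speed fits.
            scale = allowed / dist
            x = prev_x + (x - prev_x) * scale
            y = prev_y + (y - prev_y) * scale
        adjusted.append((t, x, y))
    return adjusted
```

A jump of 1,000 meters in 10 seconds under a 30 m/s baseline would, in this sketch, be shortened to 300 meters in the same direction, and later points would be measured from the adjusted position.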

After analyzing each segment in block 620, descriptors may be added to each segment in block 634. The descriptors may include transportation mode, average speed, and other metadata. Demographic information may be added in block 636 describing the user.

After analyzing each sequence in block 616, the trajectories may be stored in block 638.

FIG. 7 is a flowchart illustration of an embodiment 700 showing a method for preparing trajectory mobility genes for transmittal. The method of embodiment 700 may be merely one example of how trajectories may be prepared for use.

Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or sets of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operation in a simplified form.

Embodiment 700 may illustrate one method by which a request for trajectory mobility genes may be fulfilled. The fulfillment method may ensure that there may be a sufficient number of trajectories such that individual trajectories may not be separately identifiable. In some cases, the trajectories may also be obfuscated.

A request for trajectory genes may be received in block 702.

The request may define a physical area of interest in block 704. The physical area of interest may be a specific physical location, such as people traveling along a highway or people traveling towards a sporting event. In some cases, the physical area of interest may be a category, such as people going out to eat, where the category may define the destination as any restaurant.

A time frame of interest may be defined in block 704. The number of available trajectories that meet the physical location and time frame criteria may be determined in block 706. If the number is below a predefined minimum number of trajectories in block 708, the search parameters may be adjusted in block 710 to include additional trajectories.
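The loop of blocks 706 through 710 may be sketched as below. Widening the time window is only one way the search parameters might be adjusted; the bounding-box area, the step size, and the round limit are all assumptions of this sketch.

```python
def in_scope(trajectory, area, start, end):
    """True if any point lies inside the area during [start, end].

    trajectory: list of (t, x, y); area: (xmin, ymin, xmax, ymax).
    """
    xmin, ymin, xmax, ymax = area
    return any(
        start <= t <= end and xmin <= x <= xmax and ymin <= y <= ymax
        for t, x, y in trajectory
    )


def select_with_minimum(trajectories, area, start, end, minimum,
                        widen_s=3600, max_rounds=24):
    """Widen the time window until at least `minimum` trajectories match.

    Returns (matches, (start, end)) on success, or ([], None) when the
    minimum cannot be met within max_rounds widenings.
    """
    for _ in range(max_rounds + 1):
        matches = [tr for tr in trajectories if in_scope(tr, area, start, end)]
        if len(matches) >= minimum:
            return matches, (start, end)
        start, end = start - widen_s, end + widen_s
    return [], None
```

Returning an empty result rather than a too-small set reflects the purpose of the minimum: a request that cannot reach the threshold is not fulfilled with fewer trajectories.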

The minimum number of trajectories may be selected for any of many reasons. In some cases, a minimum number of trajectories may allow a mobility gene to anonymize the data such that a single trajectory may not be individually identified. In many cases, a summarized demographic profile may be provided with the trajectories, and when a low number of trajectories may be provided, it may be possible to single out a trajectory as possibly belonging to an outlier in the demographic profile.

Another reason for using a minimum number of trajectories may be to ensure relatively accurate subsequent analyses. A small set of trajectories may give highly skewed results in some cases, and by having larger datasets, more meaningful results may be calculated with higher confidence.

The trajectories meeting the criteria may be retrieved in block 714. For each trajectory in block 716, the trajectory origins or destinations may be obfuscated in block 718, and demographic data may be collected in block 720.

The obfuscation of the trajectory may be accomplished by several different methods. One way to obfuscate a trajectory may be by truncating a trajectory. One use case may be to use trajectories to determine the density of riders on a subway system. The density may be derived from the number of trajectories from one train station to the next, but the analysis does not need to include origin and destination. By truncating the trajectories to just the portion from one train station to the next, anonymity may be preserved.

Another way to obfuscate a trajectory may be to summarize an origin or destination. A person may be personally identified when that person begins or ends their journey at their home address. In such cases, a trajectory may be anonymized by using a centralized location as a substitute for a home address. For example, a centralized location in a housing district may be substituted for a user's home address in their trajectory. Such a substitution may be made with a work address or some other origin or destination.

Another way to obfuscate a trajectory may be to truncate a trajectory at a common location near the origin or destination. For example, a person who may travel by subway to their home may have their trajectory truncated at the train station where they alight.
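Truncation at a common location may be sketched for the destination end of a trajectory as follows. The hub list, the radius, and the choice to withhold a trajectory that never passes a hub are assumptions made for illustration.

```python
import math


def truncate_at_common_location(points, hubs, radius=150.0):
    """Cut a trajectory at the last point that lies near a shared hub.

    points: list of (t, x, y) in meters (assumed layout).
    hubs: list of (x, y) positions of common locations such as stations.
    Everything after the last point within `radius` of any hub is dropped.
    """
    for i in range(len(points) - 1, -1, -1):
        _, x, y = points[i]
        if any(math.hypot(x - hx, y - hy) <= radius for hx, hy in hubs):
            return points[: i + 1]
    # No hub on the path: return nothing rather than expose the destination.
    return []
```

The same function may be applied to a reversed trajectory to truncate the origin end.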

After analyzing all of the trajectories in block 716, the demographic data may be summarized for the group of trajectories in block 722. The mobility genes may be transmitted in block 724.

FIG. 8 is a flowchart illustration of an embodiment 800 showing a method for creating visit mobility genes from trajectory genes. The method of embodiment 800 may be merely one example of how visit genes may be created.

Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or sets of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operation in a simplified form.

Embodiment 800 may be one example of how to create a visit mobility gene. A visit mobility gene may give various information and statistics about people's visits to certain locations. In some cases, a data consumer may wish to find information about people's visits to a specific location, such as a shopping mall, recreational venue, a specific coffee shop, or other location.

In other cases, a data consumer may wish to find information about people's visits to certain classes of locations, such as fast food restaurants, grocery stores, or some other category.

Embodiment 800 may be one way to identify visits from trajectories. In this method, places where a person's trajectory pauses or remains within a certain area may be considered visits. Once a visit may be identified, the visit may be matched to a known physical location, then the visit may be classified, and demographics may be added.

The operations of embodiment 800 may be an example of an analysis that may be performed any time a trajectory may be generated. In some systems, trajectory mobility genes may be constantly generated from recently generated data. As each trajectory may be created, a visit analysis such as embodiment 800 may be performed to identify, classify, and store visits in a database.

Trajectories may be received in block 802. For each trajectory in block 804, a period of little movement may be identified in block 806. The period of little movement may be analyzed in block 808 to determine a length of visit. If the visit does not exceed a minimum threshold in block 810, the visit may be ignored in block 812.
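Blocks 806 through 812 may be sketched with a simple stay-point scan. The radius that defines "little movement", the minimum stay, and the dictionary returned for each visit are assumptions of this sketch, which again uses planar coordinates in meters.

```python
import math


def find_visits(points, radius=100.0, min_stay_s=900.0):
    """Find periods in which a trajectory stays within a small area.

    points: list of (t, x, y) sorted by t (assumed layout).
    A visit starts at an anchor point and extends while each following
    point stays within `radius` meters of that anchor. Stays shorter than
    min_stay_s are ignored.
    """
    visits, i = [], 0
    while i < len(points):
        t0, x0, y0 = points[i]
        j = i
        while (j + 1 < len(points)
               and math.hypot(points[j + 1][1] - x0, points[j + 1][2] - y0) <= radius):
            j += 1
        if points[j][0] - t0 >= min_stay_s:
            stay = points[i:j + 1]
            visits.append({
                "arrive": t0,
                "leave": points[j][0],
                # The centroid of the stay is used as the visit position.
                "x": sum(p[1] for p in stay) / len(stay),
                "y": sum(p[2] for p in stay) / len(stay),
            })
            i = j + 1
        else:
            i += 1
    return visits
```

The position of each visit may then be matched against home, work, or other known locations in the later blocks.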

When the visit exceeds a threshold in block 810, an attempt may be made to identify a home or work location in block 814. The home or work location of a person may be visited very frequently, typically every day.

The home and work location of a person may be a special category of locations for several reasons. For example, many movement studies may involve people's movements to and from work or home. As another example, home and work locations may be a way to identify a trajectory as belonging to a specific person.

If a match for home or work is made in block 816, the visit may be marked as home or work in block 818. When the visit is not to home or work, an attempt may be made in block 820 to match the visit to a known location. If there is a match in block 822, the visit may be marked with the location in block 824.

The matching in block 820 may attempt to match a visit to a business, organization, physical feature such as a park, or some other metadata about a location. Such metadata may enrich the data stored for a visit. For example, a visit near a grocery store that takes 20 minutes or so may be classified as a visit to the grocery store. Such grocery store visits may be searched and aggregated into a visit mobility gene for further analysis.

The visit type and duration may be classified in block 826 and demographic information may be added in block 828. The visit mobility gene information may be stored in block 830.

FIG. 9 is a flowchart illustration of an embodiment 900 showing a second method for creating visit mobility genes. The method of embodiment 900 may be merely one example of how visit mobility genes may be created from raw location observations.

Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or sets of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operation in a simplified form.

Embodiment 900 may be another way of identifying and classifying visits as part of a visit mobility gene. In this method, a set of locations is given, and the raw observation data may be searched to find occasions where each location was visited. From these data points, various aspects of a visit mobility gene may be derived.

Raw location observations may be received in block 902, as well as a set of locations of interest in block 904.

For each location of interest in block 906, raw location observations meeting the location criteria may be found in block 908. The user identifications for those observations may be found in block 910.

For each user identification in block 912, a length of stay may be determined in block 914. If the stay does not exceed a minimum value in block 916, the visit may be ignored in block 918.

When the visit does exceed the minimum value in block 916, the demographic information about the user may be gathered in block 918.

An inbound trajectory may be calculated in block 920 and an outbound trajectory may be determined in block 922. The inbound and outbound trajectories may be useful to help understand visitors' movements before and after the visit.

In some cases, the visit information may be anonymized. For example, inbound and outbound trajectories may be truncated or otherwise obfuscated. The visit data may be stored in block 928.

FIG. 10 is a diagram illustration of an embodiment 1000 showing a method for determining a trajectory pathway. Embodiment 1000 illustrates high level steps for calculating a movement trajectory that may be mapped to physical thoroughfares.

A physical map 1002 may contain a street map 1004, which may show various roads and highways 1006.

From the physical map 1002, a graph 1008 of transport pathways may be generated. The graph 1008 may be a description of the physical map 1002 that may be interpreted and analyzed by a computer. In many cases, the graph 1008 may have edges representing segments of thoroughfares, and the nodes may be intersections between the segments. The graph may be a computer-readable structure that may be traversed and analyzed by a computer when attempting to determine a user's physical path through the map 1002.
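A graph of this kind may be sketched as an adjacency mapping, with intersections as nodes and thoroughfare segments as weighted edges. The segment input format and the function names are assumptions, and Dijkstra's algorithm is used here only as one familiar way to traverse such a structure.

```python
import heapq
import math
from collections import defaultdict


def build_graph(segments):
    """Build an undirected graph from thoroughfare segments.

    segments: iterable of ((x1, y1), (x2, y2)) pairs, each joining two
    intersections (an assumed input format, in meters).
    Returns {node: {neighbor: length}}.
    """
    graph = defaultdict(dict)
    for a, b in segments:
        length = math.dist(a, b)
        graph[a][b] = length
        graph[b][a] = length
    return graph


def shortest_distance(graph, source, target):
    """Dijkstra's algorithm: shortest travel distance along the graph."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, length in graph.get(node, {}).items():
            candidate = dist + length
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return math.inf
```

Distances measured along the graph, rather than in a straight line, may be what makes a calculated trajectory follow roads instead of cutting across them.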

For each time period in a trajectory segment, candidate locations may be determined 1010. A location 1012 received from a telecommunications network may be represented by several candidate locations 1014. The location 1012 may have an accuracy or tolerance, which may be used to select several candidate locations 1014. The trajectory path may be calculated 1016 by finding an optimized route 1018 through the sequence of candidate locations.
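One way an optimized route through the candidate locations might be found is with dynamic programming over the time periods, in the style of the Viterbi algorithm. The embodiment does not name a specific optimization, so the technique, the cost model, and the use of straight-line distance in place of distance along the graph are all assumptions of this sketch.

```python
import math


def best_route(candidate_sets):
    """Pick one candidate per time period so the total cost is lowest.

    candidate_sets: non-empty list with one entry per time period; each
    entry is a non-empty list of (x, y, cost) tuples, where cost is a
    penalty for how poorly the candidate fits the observed location.
    Moving between consecutive candidates costs their straight-line
    distance (a stand-in for distance along the transport graph).
    Returns the chosen (x, y) for each time period.
    """
    # totals[i]: lowest total cost of any route ending at candidate i.
    totals = [cost for _, _, cost in candidate_sets[0]]
    back_pointers = []
    for previous, current in zip(candidate_sets, candidate_sets[1:]):
        new_totals, pointers = [], []
        for x, y, cost in current:
            options = [
                totals[k] + math.hypot(x - px, y - py)
                for k, (px, py, _) in enumerate(previous)
            ]
            k_best = min(range(len(options)), key=options.__getitem__)
            new_totals.append(options[k_best] + cost)
            pointers.append(k_best)
        totals = new_totals
        back_pointers.append(pointers)

    # Walk backwards from the cheapest final candidate.
    index = min(range(len(totals)), key=totals.__getitem__)
    route = [index]
    for pointers in reversed(back_pointers):
        index = pointers[index]
        route.append(index)
    route.reverse()
    return [candidate_sets[t][i][:2] for t, i in enumerate(route)]
```

The work grows with the number of time periods multiplied by the square of the candidates per period, which is one reason a tight candidate set, such as one derived from GPS data, may be cheaper to process.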

FIG. 11 is a diagram illustration of an embodiment 1100 showing a calculated trajectory 1112 generated from a sequence of locations.

A map 1102 may illustrate the position of several locations 1104, 1106, 1108, and 1110. Each of the locations 1104, 1106, 1108, and 1110 may represent individual positions captured by a telecommunications network in successive time intervals. From these positions, a calculated trajectory 1112 may be generated that maps to the user traversing the highway 1114.

The locations 1104, 1106, 1108, and 1110 may be located away from the highway 1114. Such offsets may arise from the inaccuracies of the location data that may be provided by a telecommunications network.

A telecommunications network may provide location data through several different mechanisms. Some networks may capture Global Positioning System (GPS) data that may originate at a user's device, then may be transmitted to the telecommunications network. Such GPS data may tend to be more accurate than other forms of location data. In such systems, each time period may have a much smaller set of candidate physical locations for analysis than with other location data.

Some networks may not capture data generated at a user device, but may capture data that may be detected through the network itself. In the coarsest type of location data, a network may merely capture the location of the tower to which a user device may be connected. The tower location may cover individual cells that may be many kilometers in size, yielding highly inaccurate location data. Other networks may have various methods of triangulating a user's location using signals from two, three, or more towers.

With each type of location data, different levels of inaccuracy or tolerance may be assumed. For GPS-generated location data, the accuracy may be relatively high. In such cases, a user's candidate positions for a given set of location coordinates may be tightly focused around the coordinates. Some systems may assign a probability factor to each candidate position, with a higher probability factor allocated to candidate positions closer to the coordinates and lower probability factors to candidates further away.

For location data that may identify merely a tower location or a cell serviced by a tower, the candidate locations may be any location within the area serviced by the tower or the cell. In some cases, each of the candidate locations may be assigned the same probability.
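The two weighting schemes described above can be illustrated as follows. The text only requires that closer candidates receive higher probability for accurate sources and that all candidates may share the same probability for tower-level sources. The Gaussian decay, the `sigma_m` parameter, and the function names below are assumptions chosen for the sketch.

```python
import math

def distance_decay_weights(distances_m, sigma_m):
    """Normalized weights that fall off with distance from the coordinate.

    A Gaussian kernel is assumed here; any monotonically decreasing
    function of distance would satisfy the description.
    """
    raw = [math.exp(-0.5 * (d / sigma_m) ** 2) for d in distances_m]
    total = sum(raw)
    return [w / total for w in raw]

def uniform_weights(count):
    """Equal weights, as might be used for tower-level or cell-level data."""
    return [1.0 / count] * count
```

For example, `distance_decay_weights([0, 10, 50], 20)` gives the candidate at the reported coordinate the largest share, while `uniform_weights(4)` gives each of four candidates a share of 0.25.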

From the analysis of embodiment 1100, the various locations 1104, 1106, 1108, and 1110 may represent cell towers or cells in which the user may have been traveling, yet the most likely calculated trajectory 1112 may be along the highway 1114. Such a situation often occurs in trajectories generated from telecommunications networks.

A calculated trajectory 1112 may be useful for further analysis. For example, traffic density and speeds along the highway 1114 may be measured. Such analysis may not have been previously possible with the highly inaccurate and sparse data that may come from a telecommunications network.

FIG. 12 is a diagram of an embodiment 1200 showing components that may analyze raw location data and provide analyzed trajectories for subsequent analyses. The example of embodiment 1200 is merely one topology that may be used to analyze raw location data.

The diagram of FIG. 12 illustrates functional components of a system. In some cases, a component may be a hardware component, a software component, or a combination of hardware and software. Some of the components may be application level software, while other components may be execution environment level components. In some cases, the connection of one component to another may be a close connection where two or more components are operating on a single hardware platform. In other cases, the connections may be made over network connections spanning long distances. Each embodiment may use different hardware, software, and interconnection architectures to achieve the functions described.

Embodiment 1200 illustrates a device 1202 that may have a hardware platform 1204 and various software components. The device 1202 as illustrated represents a conventional computing device, although other embodiments may have different configurations, architectures, or components.

In many embodiments, the device 1202 may be a server computer. In some embodiments, the device 1202 may also be a desktop computer, laptop computer, netbook computer, tablet or slate computer, wireless handset, cellular telephone, game console or any other type of computing device. In some embodiments, the device 1202 may be implemented on a cluster of computing devices, which may be a group of physical or virtual machines.

The hardware platform 1204 may include a processor 1208, random access memory 1210, and nonvolatile storage 1212. The hardware platform 1204 may also include a user interface 1214 and network interface 1216.

The random access memory 1210 may be storage that contains data objects and executable code that can be quickly accessed by the processors 1208. In many embodiments, the random access memory 1210 may have a high-speed bus connecting the memory 1210 to the processors 1208.

The nonvolatile storage 1212 may be storage that persists after the device 1202 is shut down. The nonvolatile storage 1212 may be any type of storage device, including hard disk, solid state memory devices, magnetic tape, optical storage, or other type of storage. The nonvolatile storage 1212 may be read only or read/write capable. In some embodiments, the nonvolatile storage 1212 may be cloud based, network storage, or other storage that may be accessed over a network connection.

The user interface 1214 may be any type of hardware capable of displaying output and receiving input from a user. In many cases, the output display may be a graphical display monitor, although output devices may include lights and other visual output, audio output, kinetic actuator output, as well as other output devices. Conventional input devices may include keyboards and pointing devices such as a mouse, stylus, trackball, or other pointing device. Other input devices may include various sensors, including biometric input devices, audio and video input devices, and other sensors.

The network interface 1216 may be any type of connection to another computer. In many embodiments, the network interface 1216 may be a wired Ethernet connection. Other embodiments may include wired or wireless connections over various communication protocols.

The software components 1206 may include an operating system 1218 on which various software components and services may operate.

A map processor 1220 may create graphs from a map of a physical transportation system. The graphs may be stored in a graph database 1222 and may be used by a map matcher 1224 to associate a location trajectory to a set of physical locations. In many cases, a map matcher 1224 may generate a sequence of roads, highways, pathways, train lines, or other thoroughfares that may correspond with user trajectories. The map processor 1220 may retrieve various maps from a remote map database 1234, which may be accessible over a network 1232.

The graphs may be computer-searchable representations of maps of physical transportation networks. A graph may be created for different modes of transportation, such as a graph of a train network, a bus network, a road system, a ferry system, a bicycle path network, pedestrian walkways, and any other transportation mode. The graphs may represent the physical world by matching the graph nodes to intersections and the graph edges to thoroughfares. Once the map is represented as a graph, the map matcher 1224 may be able to find a sequence of physical positions that correspond with locations that may have been observed for a device.

The map matcher 1224 may attempt to find a physically logical sequence for a trajectory. The physically logical sequence may involve finding a path that makes sense based on the speed, mode of transportation, or other factors in the data. For example, a trajectory that results in a user moving at three times the speed limit of a side street may be impractical or impossible, so the sequence may be recomputed with a trajectory that traverses a highway with a much faster speed limit.
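The speed test in the example above might look like the following sketch. The `tolerance` multiplier is a hypothetical parameter, since the text does not state how far above a speed limit a trajectory may run before it is treated as implausible.

```python
def implied_speed_kmh(distance_m, elapsed_s):
    """Average speed needed to cover distance_m in elapsed_s seconds."""
    return (distance_m / elapsed_s) * 3.6

def is_plausible(distance_m, elapsed_s, speed_limit_kmh, tolerance=1.5):
    """True when the implied speed is within tolerance times the limit.

    tolerance is an assumed margin, not a value taken from the description.
    """
    if elapsed_s <= 0:
        return False
    return implied_speed_kmh(distance_m, elapsed_s) <= speed_limit_kmh * tolerance
```

Covering 1500 meters in 60 seconds implies 90 km/h. Under these assumptions that is implausible on a side street with a 30 km/h limit but plausible on a highway with a 90 km/h limit, so a matcher might recompute the route along the highway.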

A trajectory generator 1226 may generate a trajectory from a sequence of location data. The location data may come from a telecommunications network 1236, which may provide a set of device locations 1238. The device locations 1238 may be constructed into trajectories, which may contain a sequence of location coordinates for individual devices. The location coordinates may be timestamped so that the coordinates may be arranged by time sequence.
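Grouping observations per device and ordering them by timestamp can be sketched as below. The record layout `(device_id, timestamp, lat, lon)` and the function name are assumptions for illustration; the actual format delivered by a telecommunications network is not specified in the text.

```python
from collections import defaultdict

def build_trajectories(observations):
    """Group (device_id, timestamp, lat, lon) records into per-device
    sequences sorted by timestamp."""
    by_device = defaultdict(list)
    for device_id, timestamp, lat, lon in observations:
        by_device[device_id].append((timestamp, lat, lon))
    return {device_id: sorted(points, key=lambda p: p[0])
            for device_id, points in by_device.items()}
```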

The trajectory generator 1226 may store trajectories in a trajectory database 1228. A map matcher 1224 may analyze trajectories from the trajectory database 1228 to create analyzed trajectories 1230. The analyzed trajectories may include a sequence of thoroughfares traveled by a user.

A user device 1240 may represent a device that may operate within a telecommunications network or other network and from which a sequence of locations may be generated. A typical user device 1240 may be a cellular telephone, but other devices may be portable laptop computers, tablet computers, wearable computer devices, or any other mobile device that may operate on any type of hardware platform 1242. In some cases, the device may have an internal location detector 1244, which may generate a location history 1246. The location history 1246 may be used by a trajectory generator 1226 to create trajectories.

In some cases, the user device 1240 may be recognized and tracked by a telecommunications network 1236 without the device having a location detector 1244. In such a mode, the telecommunications network 1236 may detect a device and its location by identifying the device within the network 1236.

FIG. 13 is a flowchart illustration of an embodiment 1300 showing a method for creating transportation graphs. The method of embodiment 1300 is merely one example of how to convert a geographic map to a graph that may be used for analyzing trajectories.

Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.

A map may be retrieved in block 1302. Thoroughfares may be identified in block 1304, along with intersections of the thoroughfares in block 1306. For each intersection in block 1308, a node may be created in the graph in block 1310. For each thoroughfare in block 1312, a graph edge may be created in block 1314.

Various characteristics of the thoroughfare may be determined in block 1316 and stored with the edge in block 1318. Characteristics may include information such as the speed limit, direction of travel, transportation modes, as well as distance and other information.

The graph may be stored in block 1320. In many cases, a different graph may be created for each mode of transportation. For example, a graph may be created for pedestrian travel, and separate graphs for travel by car, bus, train, ferry, bicycle, or other transportation mode.
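The graph construction of embodiment 1300 can be sketched with plain data classes: one node per intersection, one edge per thoroughfare, edge characteristics stored with the edge, and one graph per transportation mode. The class names, field names, and the dictionary layout of the input are all hypothetical and chosen only to make the sketch self-contained.

```python
from dataclasses import dataclass, field

@dataclass
class Edge:
    start: str
    end: str
    length_m: float
    speed_limit_kmh: float
    one_way: bool = False

@dataclass
class TransportGraph:
    mode: str
    nodes: dict = field(default_factory=dict)  # node id -> (lat, lon)
    edges: list = field(default_factory=list)

def build_graphs(intersections, thoroughfares):
    """Create one TransportGraph per transportation mode.

    intersections maps node id -> (lat, lon). Each thoroughfare is a dict
    with keys start, end, length_m, speed_limit_kmh, modes, and an
    optional one_way flag (an assumed input format).
    """
    graphs = {}
    for t in thoroughfares:
        for mode in t["modes"]:
            graph = graphs.setdefault(mode, TransportGraph(mode))
            # A node is created for each intersection the thoroughfare touches.
            for node_id in (t["start"], t["end"]):
                graph.nodes[node_id] = intersections[node_id]
            # The edge carries the thoroughfare's characteristics.
            graph.edges.append(Edge(t["start"], t["end"], t["length_m"],
                                    t["speed_limit_kmh"],
                                    t.get("one_way", False)))
    return graphs
```

Keeping a separate graph per mode means a matcher that has determined the transportation mode only needs to search the thoroughfares that mode can actually use.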

FIG. 14 is a flowchart illustration of an embodiment 1400 showing a method for analyzing trajectory paths. The method of embodiment 1400 may be merely one example of how to map a set of location observations to actual, physical locations that a user may have traversed in their trajectory.

Other embodiments may use different sequencing, additional or fewer steps, and different nomenclature or terminology to accomplish similar functions. In some embodiments, various operations or set of operations may be performed in parallel with other operations, either in a synchronous or asynchronous manner. The steps selected here were chosen to illustrate some principles of operations in a simplified form.

A trajectory may be received in block 1401. From the trajectory, a transportation mode may be determined in block 1402, and a corresponding transportation graph may be retrieved in block 1404.

For each element in the trajectory segment in block 1406, a set of candidate physical locations may be determined in block 1408. Weights may be applied to each candidate location in block 1410.

For some systems, a weighted probability of the candidate locations may be applied. For example, some systems may apply a probability factor that is high for candidate locations near the location coordinates and lower for candidate locations further away from the location coordinates. Such a system may be used when analyzing GPS location data.

The candidate locations may be stored in a time sequence in block 1412.

A path through the candidate locations may be generated in block 1414. The path may be generated by finding an optimized route through the candidate locations, such as optimizing by time, distance, minimum number of turns, or some other optimization function.
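One way to find such a route is dynamic programming over the time-ordered candidate sets, choosing one candidate per time period so that the total distance traveled is smallest. The sketch below is an illustration under simplifying assumptions and is not presented as the optimization used by the embodiment: candidates are planar `(x, y)` points in meters, the cost is straight-line distance rather than distance along the transportation graph, and the function name is hypothetical.

```python
import math

def optimal_path(candidates_per_step):
    """Pick one candidate per time step minimizing total straight-line distance.

    candidates_per_step is a list (one entry per time period) of lists of
    (x, y) points in meters. Returns the chosen point for each step.
    """
    if not candidates_per_step or any(not c for c in candidates_per_step):
        return []
    # cost[i]: best total distance of any path ending at candidate i
    cost = [0.0] * len(candidates_per_step[0])
    back = []  # back[k][j]: best predecessor index for candidate j of step k+1
    for prev, cur in zip(candidates_per_step, candidates_per_step[1:]):
        new_cost, pointers = [], []
        for point in cur:
            best = min(range(len(prev)),
                       key=lambda i: cost[i] + math.dist(prev[i], point))
            new_cost.append(cost[best] + math.dist(prev[best], point))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)
    # Walk the back pointers from the cheapest final candidate.
    idx = min(range(len(cost)), key=cost.__getitem__)
    path = [candidates_per_step[-1][idx]]
    for step in range(len(back) - 1, -1, -1):
        idx = back[step][idx]
        path.append(candidates_per_step[step][idx])
    path.reverse()
    return path
```

Other objectives named in the text, such as travel time or number of turns, could be substituted by replacing the `math.dist` term with a different transition cost.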

Once the candidate locations have been coalesced into a path in block 1414, a sequence of thoroughfares may be determined in block 1416. The sequence may be analyzed in block 1418 for inconsistencies. The inconsistencies may be items such as large changes in speed, excessive speed for certain thoroughfares, large directional changes, or some other factor that may be inconsistent with basic physics or otherwise unlikely to occur.

For each inconsistency in block 1420, a determination may be made whether the inconsistency may be physically impossible in block 1422. If the inconsistency is physically impossible in block 1424, the candidate locations causing the impossibility may be removed from consideration in block 1426. If the inconsistency may be physically possible in block 1424, the candidate points may be de-emphasized in block 1428. One mechanism for de-emphasizing may be to give the candidate location a lower probability, for example.
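The remove-or-de-emphasize decision can be sketched as an update to a table of candidate weights. The speed-based impossibility test, the `max_possible_kmh` threshold, the `penalty` factor, and the function name are all assumptions; the text leaves the specific test and the amount of de-emphasis open.

```python
def handle_inconsistency(weights, candidate_id, implied_speed_kmh,
                         max_possible_kmh, penalty=0.5):
    """Return a new weight table after resolving one inconsistency.

    weights maps candidate id -> probability weight. A candidate whose
    implied speed exceeds max_possible_kmh is treated as physically
    impossible and removed; otherwise its weight is multiplied by penalty.
    """
    updated = dict(weights)
    if implied_speed_kmh > max_possible_kmh:
        updated.pop(candidate_id, None)
    elif candidate_id in updated:
        updated[candidate_id] *= penalty
    return updated
```

After such an update, the path may be recalculated over the remaining and reweighted candidates.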

After analyzing the inconsistencies in block 1420, if there are remaining inconsistencies in block 1430, the process may return to block 1414 to re-calculate a path. Once the inconsistencies have been eliminated in block 1430, intermediate locations may be interpolated into the trajectory path in block 1432 and stored as part of a trajectory segment in block 1434.
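Interpolating intermediate locations between two matched points can be sketched as below, assuming constant speed along a straight line between the points and planar coordinates in meters. A fuller version might interpolate along the matched thoroughfare instead; the function name and the `step_s` parameter are hypothetical.

```python
def interpolate(p0, p1, step_s):
    """Return points every step_s seconds strictly between p0 and p1.

    p0 and p1 are (timestamp_s, x, y) tuples with p0 earlier than p1.
    Constant speed on a straight line is assumed.
    """
    t0, x0, y0 = p0
    t1, x1, y1 = p1
    if step_s <= 0 or t1 <= t0:
        return []
    points = []
    t = t0 + step_s
    while t < t1:
        fraction = (t - t0) / (t1 - t0)
        points.append((t, x0 + fraction * (x1 - x0), y0 + fraction * (y1 - y0)))
        t += step_s
    return points
```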

FIG. 15 is a diagram illustration of an embodiment 1500 showing a theoretical view of a location analysis.

Three different time periods may be illustrated as time T 1502, time T+1 1504, and time T+2 1506. Time T 1502 may have a set of candidate locations 1508, time T+1 1504 may have a set of candidate locations 1510, and time T+2 1506 may have a set of candidate locations 1512.

A minimizing function or other optimization technique may be used to find an optimum path 1514 through the candidate locations. The optimum path 1514 may represent the best fit of a path through the candidate locations for each time period. Once the optimum path 1514 has been determined and checked for inconsistencies, path 1514 may represent the most likely path that a user may have traversed for the data in a trajectory.

The foregoing description of the subject matter has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the subject matter to the precise form disclosed, and other modifications and variations may be possible in light of the above teachings. The embodiment was chosen and described in order to best explain the principles of the invention and its practical application to thereby enable others skilled in the art to best utilize the invention in various embodiments and various modifications as are suited to the particular use contemplated. It is intended that the appended claims be construed to include other alternative embodiments except insofar as limited by the prior art.

What is claimed is:
 1. A method performed on at least one computer processor, said method comprising: receiving a trajectory segment comprising a sequence of locations for a device, said locations comprising a timestamp and a coordinate location, said locations further comprising an accuracy; for each of said sequence of locations, mapping a plurality of physical locations within said coordinate location and said accuracy; determining a transportation route comprising a physical location for each of said sequence of locations.
 2. The method of claim 1, said mapping comprising determining a plurality of geophysical locations within said accuracy of said coordinate location, said geophysical locations being defined in a transportation graph.
 3. The method of claim 2, said transportation graph comprising a road system.
 4. The method of claim 2, said transportation graph comprising a railway system.
 5. The method of claim 2, said transportation graph comprising a bus system.
 6. The method of claim 2, said transportation graph comprising a ferry system.
 7. The method of claim 2, said transportation graph comprising a plurality of nodes and edges, at least one of said nodes being an intersection.
 8. The method of claim 1 further comprising: identifying a trajectory comprising a start point and end point, said trajectory being comprised of said sequence of locations; determining said trajectory segment from said trajectory, said trajectory segment being a portion of said trajectory being traveled on a first transportation mode.
 9. The method of claim 8, said transportation route comprising a sequence of passageways within said transportation graph.
 10. The method of claim 9 further comprising: determining a first speed between a first location at a first time and a second location at a second time; and interpolating at least one intermediate location at an intermediate time between said first location at said first time and said second location at said second time.
 11. A system comprising: at least one processor; a map matching analyzer operating on said at least one processor and configured to perform a method comprising: receiving a trajectory segment comprising a sequence of locations for a device, said locations comprising a timestamp and a coordinate location, said locations further comprising an accuracy; for each of said sequence of locations, mapping a plurality of physical locations within said coordinate location and said accuracy; determining a transportation route comprising a physical location for each of said sequence of locations.
 12. The system of claim 11, said mapping comprising determining a plurality of geophysical locations within said accuracy of said coordinate location, said geophysical locations being defined in a transportation graph.
 13. The system of claim 12, said transportation graph comprising a road system.
 14. The system of claim 12, said transportation graph comprising a railway system.
 15. The system of claim 12, said transportation graph comprising a bus system.
 16. The system of claim 12, said transportation graph comprising a ferry system.
 17. The system of claim 12, said transportation graph comprising a plurality of nodes and edges, at least one of said nodes being an intersection.
 18. The system of claim 11, said method further comprising: identifying a trajectory comprising a start point and end point, said trajectory being comprised of said sequence of locations; determining said trajectory segment from said trajectory, said trajectory segment being a portion of said trajectory being traveled on a first transportation mode.
 19. The system of claim 18, said transportation route comprising a sequence of passageways within said transportation graph.
 20. The system of claim 19, said method further comprising: determining a first speed between a first location at a first time and a second location at a second time; and interpolating at least one intermediate location at an intermediate time between said first location at said first time and said second location at said second time.